In this work, the charge transport (CT) properties of the p53 gene are studied numerically by the transfer matrix method, using either single- or double-strand effective tight-binding models. A statistical analysis of the consequences of known p53 point mutations on CT features is performed.



It is found that, in contrast to other kinds of mutation defects, cancerous mutations result in much weaker changes of CT efficiency. Given the envisioned role played by CT in the DNA-repairing mechanism, our theoretical results suggest an underlying physical explanation at the origin of carcinogenesis.
1 Introduction The electronic transmission properties of DNA molecules are believed to play a critical role in many physical phenomena taking place in living organisms [1-4]. For instance, it is believed that charge transfer (CT) through DNA is inhibited at damaged sites of the sequence, owing to misalignments of the base-pair π-stacking. Similarly, base excision repair (BER) enzymes such as Endonuclease III and MutY are suggested to locate DNA base lesions or mismatches efficiently by probing DNA-mediated CT [5-7].

Besides, given that the development of cancers is closely related to the DNA damage/repair mechanism [8], how CT properties are modified when mutations start to develop is an important question to address. One of the most important genes in cancer research is p53, also known as the "guardian of the genome" [9]. Indeed, p53 encodes the tumor suppressor protein TP53, which suppresses tumor development by activating the DNA repair mechanisms or, if DNA repair is impossible, the cell apoptosis process. There are 20303 base pairs in the p53 sequence (NCBI accession number X54156). More than 50% of human cancers are related to mutations of the p53 gene, which usually jeopardize the efficient activity of TP53 [10]. Most of the cancerous mutations are point mutations (a base pair substituted by another), with highly non-uniform distributions along the DNA sequence [11]. Each point mutation can be described by two parameters (k, s), giving respectively the mutation position k on the sequence and the nucleotide type s (either A, C, G, or T) substituting the original one. The most frequent mutation locations found in cancer cells are named mutation "hotspots". From the International Agency for Research on Cancer (IARC) database [11], it is found that most hotspots of p53 are located in exons 5-8, in the interval from the 13055th to the 14588th nucleotide. The 13203rd base pair has the highest frequency of occurrence (1055 times), and more than 80% of the total 23544 cases in the database occur on 1% of the base pairs of p53. The mutation (k, s) is said to be "cancerous" ("noncancerous") if it is (not) found in the IARC database.

In this paper, the effects of all possible point mutations on CT are studied for the p53 gene using appropriate tight-binding models and energy parameters which are known to reproduce experimental results or first-principles calculations [12, 13]. We find that anomalously small changes of CT efficiency coincide with cancerous mutations. In contrast, noncancerous mutations result, on average, in much larger changes of the CT properties. From this analysis, we propose a new scenario for understanding how cancerous mutations may shortcut the DNA damage/repair processes.

2 Models for charge transport in DNA The generic form of the simple but physically sound one-channel model of coherent hole transport in DNA is given by an effective tight-binding Hamiltonian (the "fishbone" (FB) model) [12]

$$ H_{FB} = \sum_{i=1}^{L}\Big[-t_i\big(|i\rangle\langle i+1| + |i+1\rangle\langle i|\big) + \varepsilon_i|i\rangle\langle i|\Big] + \sum_{i=1}^{L}\sum_{q=\uparrow,\downarrow}\Big[-t_i^{q}\big(|i,q\rangle\langle i| + |i\rangle\langle i,q|\big) + \varepsilon_i^{q}|i,q\rangle\langle i,q|\Big] \tag{1} $$

where each lattice point stands for a nucleotide base pair of the chain, i = 1, ..., L. Here t_i is the hopping amplitude between the ith and (i+1)th base pairs, ε_i is the onsite potential of the ith base pair, and t_i^q, with q = ↑, ↓, is the hopping amplitude between the ith base pair and its neighboring (upper and lower) backbone sites |i,q⟩. The onsite energy at the sites |i,q⟩ is given by ε_i^q. The model reduces to the simplest one-leg (1L) model if the sugar-phosphate backbone sites |i,q⟩ of DNA are absent, that is, t_i^q ≡ ε_i^q ≡ 0 [14-17]. This one-channel model is shown schematically in Fig. 1(a).
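To make the structure of Eq. (1) concrete, the following minimal NumPy sketch assembles the FB Hamiltonian as a dense matrix for a short sequence, and the 1L model by dropping the backbone sites. It is an illustration under stated assumptions, not the authors' code: the helper name `fishbone_hamiltonian` and the site ordering are hypothetical, a uniform hopping t_i = t and identical backbone parameters for q = ↑, ↓ are assumed, and the default values are the parameter choices quoted in the next paragraph.

```python
import numpy as np

# Onsite (ionization) energies in eV for each base, as quoted in the text.
ONSITE = {"A": 8.24, "C": 8.87, "G": 7.75, "T": 9.14}


def fishbone_hamiltonian(seq, t=0.4, t_q=0.7, eps_q=8.5, backbone=True):
    """Dense Hamiltonian of Eq. (1) (FB); backbone=False gives the 1L model.

    Hypothetical helper. Site ordering (our choice): bases 0..L-1, upper
    backbone sites L..2L-1, lower backbone sites 2L..3L-1. Uniform hopping
    t_i = t and identical backbone parameters for q = up, down are assumed.
    """
    L = len(seq)
    n = 3 * L if backbone else L
    H = np.zeros((n, n))
    for i, base in enumerate(seq):
        H[i, i] = ONSITE[base]                      # eps_i
        if i + 1 < L:
            H[i, i + 1] = H[i + 1, i] = -t          # -t_i between base pairs
        if backbone:
            for offset in (L, 2 * L):               # q = up, down
                b = offset + i
                H[b, b] = eps_q                     # eps_i^q
                H[i, b] = H[b, i] = -t_q            # -t_i^q base-backbone link
    return H


H_1L = fishbone_hamiltonian("GATTACA", backbone=False)
H_FB = fishbone_hamiltonian("GATTACA")
print(H_1L.shape, H_FB.shape)                       # (7, 7) (21, 21)
print(np.linalg.eigvalsh(H_1L).round(3))            # 1L spectrum in eV
```

Diagonalizing such matrices gives the spectra of the isolated sequence; the transport calculations below instead work with transfer matrices, which avoid storing H for long sequences.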

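To account for the full double-strand nature of DNA, an alternative two-channel ladder model (LM), shown in Fig. 1(b), is also used. The corresponding Hamiltonian is given as [18]

$$ H_{LM} = \sum_{i=1}^{L}\Big\{\sum_{\tau=1,2}\Big[t_{i,\tau}|i,\tau\rangle\langle i+1,\tau| + t_i^{q(\tau)}|i,\tau\rangle\langle i,q(\tau)|\Big] + t_{12}|i,1\rangle\langle i,2| + \mathrm{h.c.}\Big\} + \sum_{i=1}^{L}\Big[\sum_{\tau=1,2}\varepsilon_{i,\tau}|i,\tau\rangle\langle i,\tau| + \sum_{q=\uparrow,\downarrow}\varepsilon_i^{q}|i,q\rangle\langle i,q|\Big] \tag{2} $$

where t_{i,τ} is the hopping amplitude between the sites along each branch τ = 1, 2, ε_{i,τ} is the corresponding onsite energy, and q(τ) denotes the backbone site attached to branch τ. The term t_{12} represents the hopping between the nucleotides of each base pair. Again, the model reduces to a two-leg (2L) model if the backbone sites are not taken into account [19-21].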

The onsite energies for each base are chosen according to the ionization energies, ε_A = 8.24 eV, ε_C = 8.87 eV, ε_G = 7.75 eV and ε_T = 9.14 eV [22-26], for each model. For model 1L, the hopping terms between base pairs are all set to t_i = 0.4 eV; other values ranging from 0.1 to 1 eV are also used to investigate the robustness of our conclusions. For model FB, t_i = 0.4 eV as in 1L. The additional hopping terms linking to the backbone are taken as 0.7 eV, whereas all backbone onsite energies are assumed to be 8.5 eV, equal to the average of the four base onsite energies. The hopping terms in model 2L between the same kind of base pairs (AT/AT, GC/GC, etc.) are chosen as 0.35 eV, and 0.17 eV otherwise [12]. The interchain coupling constant is fixed to t_{12} = 0.1 eV. Last, in model LM, intrachain and interchain hopping strengths are taken as in the two-leg model 2L, and the backbone energetics is treated as in the fishbone model.

Figure 1 Schematic models for hole transport in DNA. The nucleobases are shown as (grey) circles. Electronic pathways are shown as lines; dashed lines and circles denote the sugar-phosphate backbone. Graph (a) shows the effective models 1L and FB (with dashed backbone) for transport along a single channel, whereas graph (b) depicts the possible two-channel transport models 2L and LM (with dashed backbone).
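A companion sketch for the two-channel models builds the 2L Hamiltonian, and LM when backbone sites are added, with the + sign convention of Eq. (2). Several choices here are our assumptions rather than statements of the text: the second strand is taken as the Watson-Crick complement of the first, "the same kind of base pairs" is read as identical consecutive base pairs, the same intrachain hopping is used on both strands, and the helper name `ladder_hamiltonian` and site ordering are hypothetical.

```python
import numpy as np

ONSITE = {"A": 8.24, "C": 8.87, "G": 7.75, "T": 9.14}   # eV
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}        # Watson-Crick pairs


def ladder_hamiltonian(seq, t_same=0.35, t_diff=0.17, t_12=0.1,
                       t_q=0.7, eps_q=8.5, backbone=False):
    """Dense Hamiltonian of the 2L model (backbone=False) or LM (backbone=True).

    Hypothetical helper. Ordering: strand 1 at 0..L-1, complementary strand 2
    at L..2L-1 and, for LM, backbone site q(1) at 2L..3L-1 and q(2) at 3L..4L-1.
    """
    L = len(seq)
    n = 4 * L if backbone else 2 * L
    H = np.zeros((n, n))
    strands = [seq, "".join(PARTNER[b] for b in seq)]
    for tau, strand in enumerate(strands):
        off = tau * L
        for i, base in enumerate(strand):
            H[off + i, off + i] = ONSITE[base]                 # eps_{i,tau}
            if i + 1 < L:
                # Assumption: "same kind" = identical consecutive base pairs.
                t = t_same if seq[i] == seq[i + 1] else t_diff
                H[off + i, off + i + 1] = H[off + i + 1, off + i] = t
            if backbone:
                b = (2 + tau) * L + i                          # site q(tau)
                H[b, b] = eps_q
                H[off + i, b] = H[b, off + i] = t_q
    for i in range(L):
        H[i, L + i] = H[L + i, i] = t_12                       # within a pair
    return H


H_2L = ladder_hamiltonian("GGAT")
H_LM = ladder_hamiltonian("GGAT", backbone=True)
print(H_2L.shape, H_LM.shape)   # (8, 8) (16, 16)
```

Keeping the backbone as explicit sites makes the LM matrix twice as large as the 2L one; the same sites could instead be decimated into energy-dependent onsite terms, as is done for FB in the method section.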

3 Method The most convenient method to evaluate the transport properties of these quasi-one-dimensional tight-binding models is the transfer matrix method (TMM) [27-31]. This approach allows one to determine the hole transmission coefficient T(E) in systems with varying cross section M and length L ≫ M. In brief, the eigenstates ψ(E) = Σ_n ψ_n|n⟩ of the Hamiltonian (here |n⟩ denotes the nth site position of the hole) are computed from (ψ_{L+1}, ψ_L)^T = τ_L(E)·(ψ_1, ψ_0)^T, where τ_L(E) is the global transfer matrix [30] and E is the energy of the injected carrier. The localization lengths are deduced from the scaling analysis of T(E), whatever effective model is used [27-30]. Besides, when the DNA sequences are assumed to be connected to semi-infinite metallic electrodes [12], T(E) takes the following analytical form [16, 32-35]:

$$ T(E) = \frac{4 - \bar{E}^2}{P + 2 - \bar{E}^2\tau_{11}\tau_{22} + \bar{E}(\tau_{11} - \tau_{22})(\tau_{12} - \tau_{21})} \tag{3} $$

with \bar{E} = (E − ε_m)/t_0 and P = Σ_{i,j=1,2} τ_{ij}^2, where ε_m and t_0 are the onsite energy and the hopping integral of the electrodes, respectively. It is readily shown that

$$ \begin{pmatrix}\tau_{11} & \tau_{12}\\ \tau_{21} & \tau_{22}\end{pmatrix} = \tau_L = M_L M_{L-1}\cdots M_2 M_1 \tag{4} $$

with

$$ M_n = \begin{pmatrix}\dfrac{E - \tilde{\varepsilon}_n}{t_n} & -\dfrac{t_{n-1}}{t_n}\\ 1 & 0\end{pmatrix}, \tag{5} $$

where \tilde{ε}_n = ε_n for 1L and \tilde{ε}_n = ε_n − Σ_q (t_n^q)^2/(ε_n^q − E) for FB, respectively [18]. In the following, ε_m = 7.75 eV and t_0 = 1 eV.
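Equations (3)-(5) translate almost line by line into code. The following is a minimal NumPy sketch for the 1L and FB models, under assumptions we state explicitly: uniform hopping t_i = t inside the sequence, contact hoppings t_0 = t_L equal to the electrode hopping (so that det τ_L = 1, on which Eq. (3) relies), both backbone sites sharing t^q and ε^q, and T(E) = 0 outside the electrode band |Ē| < 2. It uses the +t sign convention; since the gauge change ψ_n → (−1)^n ψ_n flips the sign of every hopping, T(E) is the same as for the −t convention of Eq. (1). The function name `transmission` is hypothetical.

```python
import numpy as np

ONSITE = {"A": 8.24, "C": 8.87, "G": 7.75, "T": 9.14}   # eV


def transmission(seq, E, t=0.4, fishbone=False, t_q=0.7, eps_q=8.5,
                 eps_m=7.75, t_0=1.0):
    """T(E) from Eqs. (3)-(5) for the 1L (fishbone=False) or FB model.

    Convention: t_n*psi[n+1] + t_{n-1}*psi[n-1] + eps_n*psi[n] = E*psi[n],
    with both contact hoppings set to the electrode value t_0.
    For FB, E must differ from eps_q (pole of the decimated backbone).
    """
    Eb = (E - eps_m) / t_0
    if abs(Eb) >= 2:              # no propagating states in the electrodes
        return 0.0
    hops = [t_0] + [t] * (len(seq) - 1) + [t_0]   # t_0, t_1..t_{L-1}, t_L
    tau = np.eye(2)
    for n, base in enumerate(seq, start=1):
        eps = ONSITE[base]
        if fishbone:              # Eq. (5): eps - sum_q (t^q)^2 / (eps^q - E)
            eps -= 2 * t_q**2 / (eps_q - E)
        M = np.array([[(E - eps) / hops[n], -hops[n - 1] / hops[n]],
                      [1.0, 0.0]])
        tau = M @ tau             # Eq. (4): tau_L = M_L ... M_2 M_1
    P = np.sum(tau**2)
    denom = (P + 2 - Eb**2 * tau[0, 0] * tau[1, 1]
             + Eb * (tau[0, 0] - tau[1, 1]) * (tau[0, 1] - tau[1, 0]))
    return (4 - Eb**2) / denom    # Eq. (3)


print(transmission("GATTACA", 8.3), transmission("GATTACA", 8.3, fishbone=True))
```

A quick sanity check of the formula is a chain identical to the electrodes (every ε_n = ε_m, t = t_0), which must transmit perfectly inside the band; sweeping E over a model's energy window gives curves of the kind shown in Fig. 2.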
To analyze the position-dependent transport properties of the p53 gene and the effect of point mutations, let us define S = (s_1, s_2, ..., s_20303) as the finite-length sequence of the p53 gene. S_{jL} is the segment of S of length L starting from the jth base pair, which satisfies S_{jL}(n) = S(n + j − 1) with n = 1, 2, ..., L. Its transmission coefficient as a function of energy is denoted as T_{jL}(E). The CT for the jth site with propagation length L is defined as the average, over all L possible subsequences of p53 of length L that contain the jth site, of the energy-integrated transmission

$$ \mathrm{CT}(j, L) = \frac{1}{L}\sum_{n=j-L+1}^{j}\frac{1}{E_1 - E_0}\int_{E_0}^{E_1} T_{nL}(E)\,dE, \tag{6} $$

where n is further restricted to 1 ≤ n ≤ 20304 − L close to the boundaries, and E_0 and E_1 delimit a suitable energy window, which we normally choose to equal the extrema of the energy spectrum of each model, i.e., [6.5, 10.5] eV for model 1L, [7.5, 10.5] eV for 2L, [8, 9.5] eV for FB and [5, 15] eV for LM.
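Equation (6) can be sketched for the 1L model as follows, reusing a compact copy of the transfer-matrix transmission so that the block runs on its own. The assumptions are ours: indices j and n are 1-based as in the text, the integral is approximated by the trapezoidal rule on a uniform energy grid, and near the sequence ends the sum is divided by the number of admissible windows (which equals L away from the ends). The names `ct_profile` and `integrate` are hypothetical.

```python
import numpy as np

ONSITE = {"A": 8.24, "C": 8.87, "G": 7.75, "T": 9.14}   # eV


def transmission(seq, E, t=0.4, eps_m=7.75, t_0=1.0):
    """1L transmission from Eqs. (3)-(5); zero outside the electrode band."""
    Eb = (E - eps_m) / t_0
    if abs(Eb) >= 2:
        return 0.0
    hops = [t_0] + [t] * (len(seq) - 1) + [t_0]
    tau = np.eye(2)
    for n, base in enumerate(seq, start=1):
        M = np.array([[(E - ONSITE[base]) / hops[n], -hops[n - 1] / hops[n]],
                      [1.0, 0.0]])
        tau = M @ tau
    P = np.sum(tau**2)
    return (4 - Eb**2) / (P + 2 - Eb**2 * tau[0, 0] * tau[1, 1]
                          + Eb * (tau[0, 0] - tau[1, 1]) * (tau[0, 1] - tau[1, 0]))


def integrate(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


def ct_profile(S, j, L, E0=6.5, E1=10.5, n_E=201, **kw):
    """Eq. (6): mean energy-averaged T over all length-L windows holding site j."""
    energies = np.linspace(E0, E1, n_E)
    starts = [n for n in range(j - L + 1, j + 1) if 1 <= n <= len(S) - L + 1]
    values = []
    for n in starts:
        segment = S[n - 1:n - 1 + L]                     # S_{nL}
        T = [transmission(segment, E, **kw) for E in energies]
        values.append(integrate(T, energies) / (E1 - E0))
    return float(np.mean(values))


print(ct_profile("GATTACAGGCTAAGTC", j=8, L=5))
```

Evaluating `ct_profile` for every j along S gives a position-resolved CT map; for the full 20303-bp sequence the inner loops would be the cost to optimize, for example by vectorizing over energies.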

If the kth base of the p53 sequence is mutated from s_k to s and j ≤ k ≤ j + L − 1 (i.e., the mutated site belongs to the segment S_{jL}), the mutated sequence will be denoted as S^{k,s}_{jL}, with S^{k,s}_{jL}(k − j + 1) = s and S^{k,s}_{jL}(i) = S_{jL}(i) for i ≠ k − j + 1. The transmission coefficients of the original and mutated sequences are denoted as T_{jL}(E) and T^{k,s}_{jL}(E), respectively. The squared difference of the transmission coefficients between the wild-type and mutated sequences is defined as

$$ \Delta^{k,s}_{jL}(E) = \big[T_{jL}(E) - T^{k,s}_{jL}(E)\big]^2. \tag{7} $$

Δ^{k,s}_{jL}(E) is then integrated over all incident energies E as

$$ \Psi^{k,s}_{jL} = \int_{E_0}^{E_1} dE\,\Delta^{k,s}_{jL}(E). \tag{8} $$

Finally, Ψ^{k,s}_{jL} is averaged over all segments of length L containing the mutation site k, to give the average effect of the mutation (k, s) on the CT of p53,

$$ \Gamma(k, s, L) = \frac{1}{L}\sum_{j=k-L+1}^{k}\Psi^{k,s}_{jL}. \tag{9} $$
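Equations (7)-(9) can be sketched in the same self-contained way for the 1L model. As in Eq. (8), the energy integral carries no normalization and is approximated by the trapezoidal rule on a uniform grid; near the sequence ends the average of Eq. (9) runs over the admissible windows only, which is our assumption. The names `mutate` and `mutation_gamma` are hypothetical, and the toy sequence merely mimics the C → A/G/T comparison of the next section.

```python
import numpy as np

ONSITE = {"A": 8.24, "C": 8.87, "G": 7.75, "T": 9.14}   # eV


def transmission(seq, E, t=0.4, eps_m=7.75, t_0=1.0):
    """1L transmission from Eqs. (3)-(5); zero outside the electrode band."""
    Eb = (E - eps_m) / t_0
    if abs(Eb) >= 2:
        return 0.0
    hops = [t_0] + [t] * (len(seq) - 1) + [t_0]
    tau = np.eye(2)
    for n, base in enumerate(seq, start=1):
        M = np.array([[(E - ONSITE[base]) / hops[n], -hops[n - 1] / hops[n]],
                      [1.0, 0.0]])
        tau = M @ tau
    P = np.sum(tau**2)
    return (4 - Eb**2) / (P + 2 - Eb**2 * tau[0, 0] * tau[1, 1]
                          + Eb * (tau[0, 0] - tau[1, 1]) * (tau[0, 1] - tau[1, 0]))


def integrate(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


def mutate(S, k, s):
    """Point mutation (k, s): replace the kth base (1-based) of S by s."""
    return S[:k - 1] + s + S[k:]


def mutation_gamma(S, k, s, L, E0=6.5, E1=10.5, n_E=201, **kw):
    """Gamma(k, s, L) of Eq. (9), built from Eqs. (7) and (8)."""
    energies = np.linspace(E0, E1, n_E)
    S_mut = mutate(S, k, s)
    psis = []
    for j in range(k - L + 1, k + 1):                    # windows containing k
        if not 1 <= j <= len(S) - L + 1:
            continue
        wild, mut = S[j - 1:j - 1 + L], S_mut[j - 1:j - 1 + L]
        delta = [(transmission(wild, E, **kw) - transmission(mut, E, **kw))**2
                 for E in energies]                       # Eq. (7)
        psis.append(integrate(delta, energies))          # Eq. (8)
    return float(np.mean(psis))                          # Eq. (9)


S = "GATTACAGGCTAAGTC"                                   # toy sequence, S[5] = 'C'
print({s: mutation_gamma(S, 6, s, L=5) for s in "AGT"})
```

By construction Γ vanishes when s equals the original base and is non-negative otherwise, so it ranks mutations by how strongly they perturb T(E) in the chosen window.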



4 Results and discussion The 14585th base (exon 8, codon 306) of the p53 sequence is found 133 times in the IARC database mutated from C to T, causing various types of cancer [36]. On the other hand, the mutations C → G and C → A are said to be noncancerous since they are never found in cancer cells. The effects of the cancerous (C → T) and noncancerous mutations, T^{14585,s}_{14570,31}(E) and Δ^{14585,s}_{14570,31}(E), for the models FB and 2L are shown in Figs. 2 and 3, respectively. The overall effect of these three mutations, Γ(14585, s, L = 20, ..., 100), for all 4 models is given in Table 1.

Figure 2 Energy dependence of the logarithmic transmission coefficients T_{14570,31}(E) of the original sequence (C, solid line) and T^{14585,s}_{14570,31}(E) of the mutated sequences (A dotted, G dot-dashed, T dashed), with length L = 31 (from the 14570th to the 14600th nucleotide) of p53. The left panel shows results for model 2L; the right two panels show the two transport windows for the FB model [18].

Figure 3 Energy dependence of the logarithmic squared differences Δ^{14585,s}_{14570,31}(E) between the transmission coefficients of the original sequence and the mutated sequences (C → T solid line, C → A dotted, C → G dot-dashed). The left panel shows results for model 2L; the right panels show model FB.

It is clear from Table 1 that in many cases the CT change due to the cancerous mutation is much smaller than for the noncancerous mutations. These results are stable over a wide range of L and model parameters. This suggests a scenario to understand how specific mutation hotspots could be robust against the repair mechanism and trigger carcinogenesis. Experimentally, BER enzymes can locate damaged sites on DNA by probing the CT of the segment bound by the enzymes [12]. If a mutation only weakly changes the CT, the enzymes will not be able to find it and the repair mechanism will not be activated. Such mutations will survive the DNA repair mechanisms and yield cancers. In contrast, mutations that strongly affect CT could be more easily detected by the CT-probing mechanism of the enzymes and therefore repaired.

Table 1 Renormalized values of the energy-averaged changes Ψ^{14585,s}_{jL}, with L = 11, 21, ..., 101 and j = 14585 − (L − 1)/2, in transmission properties for the 4 tight-binding models. All data are shown with at most 3 significant figures. Common multiplication factors for each group of data with given L and mutations C → A, G and T are suppressed. Bold entries denote minima for the CT change of C → T.

Mutation   L      1L      FB      2L      LM
C → A      11     71.2    1.276   4.84    3.24
C → G      11     113     0.164   5.70    5.40
C → T      11     16.3    0.013   1.62    0.29
C → A      21     16.4    7.58    2.70    5.31
C → G      21     30.5    1.08    5.52    44.3
C → T      21     3.23    0.19    0.18    3.73
C → A      31     15.7    548     14.6    13.9
C → G      31     21.4    5459    5.18    0.55
C → T      31     9.14    0.63    3.60    0.23
C → A      41     1.16    30.7    0.52    3.66
C → G      41     2.21    0.72    2.99    5.23
C → T      41     0.40    0.009   1.36    1.17
C → A      51     0.56    232     1.60    1.00
C → G      51     0.90    0.21    41.7    2527
C → T      51     0.71    0.13    3.36    9.56
C → A      61     0.84    3160    1.50    0.70
C → G      61     2581    2.95    1.26    9.01
C → T      61     1.29    1.84    14.4    99.0
C → A      71     0.99    3187    1.48    0.12
C → G      71     9.03    0.29    1.45    0.91
C → T      71     4.59    0.19    1.47    18.4
C → A      81     3.61    3939    5.53    3.19
C → G      81     237     3.61    5.49    5.48
C → T      81     0.14    2.30    5.40    0.90
C → A      91     1.06    1183    10.1    11.5
C → G      91     232     1.1     92.6    60.4
C → T      91     2.95    0.69    0.32    0.47
C → A      101    1.63    9143    199.1   102.2
C → G      101    1044    8.68    820.1   493.3
C → T      101    8.64    5.33    0.1     0.1

Figure 4 Γ(k, s, L) for the 1L model with (a) (t_i, L) = (0.1, 20), (b) (0.4, 80), and (c) (1.0, 140) for all cancerous point mutations of p53, versus their frequencies found in the IARC database.

The results presented thus far are for a particular hotspot. To further challenge this scenario, many more hotspots of the p53 gene have been analyzed. We have thus calculated Γ(k, s, L) for the 14 hotspots with the highest mutation frequencies and for L up to 160 in all 4 models. The results show that the behavior of each hotspot is qualitatively similar for all models. Thus the following analysis is performed on all hotspots for the 1L model. Figure 4(b) shows the correlation between the frequency found in cancer cells and the CT change Γ(k, s, L = 80) with t_i = 0.4 eV. It is clear that the hotspots with the highest frequencies correspond to smaller Γ. Thus the correlation observed in Fig. 2 for the 14585th site is common to most of the hotspots. Figures 4(a) and 4(c) show similar behavior for t_i = 0.1 and 1 eV, respectively. The scenario is thus found to be robust for a wide range of t_i.

5 Conclusion The CT modifications due to all possible point mutations of the p53 tumor suppressor gene have been analyzed by the TMM together with statistical methods. The results show that, on average, the cancerous mutations of the gene yield smaller changes of the CT than the noncancerous mutations. The tendency holds for the 4 studied tight-binding models (1L, FB, 2L, and LM) and is robust for a wide range of the hopping integral t_i (0.1-1.0 eV).

These results suggest a possible scenario of how cancerous mutations might circumvent the DNA damage-repair mechanism and survive to yield carcinogenesis. However, our analysis is only valid in a statistical sense, and we do observe occasional noncancerous mutations with weak changes of CT. For these, other DNA repair processes should exist, and we therefore do not intend to claim that DNA-damage repair solely uses a CT-based criterion. Still, our results exhibit an intriguing new correlation between the electronic structure of DNA hotspots and the DNA damage-repair process.

To further support the above-mentioned scenario, additional complexities of the DNA energetics should also be considered. These include the role of electron-phonon coupling, polaronic transport, more detailed sequence-dependent energetics such as two-strand couplings, electronic correlations, as well as metal/DNA contact interactions [18-21, 37-43]. Ultimately, experimental studies of short strands of wild-type and mutated subsequences of the p53 gene should be performed to challenge our theory. The length scales of DNA required to unveil our mechanism are already within the scope of experimental measurements [44-47].